33 research outputs found

    The scoring of poses in protein-protein docking: current capabilities and future directions

    BACKGROUND: Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. RESULTS: We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. CONCLUSIONS: All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
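As a rough illustration of the evaluation idea in this abstract, the sketch below computes a top-N success rate and compares which complexes two scoring functions solve. The abstract does not name its three set theoretic measures, so the union, intersection and set differences used here are assumptions chosen for illustration, and the complex identifiers and rankings are toy data, not results from the paper.

```python
def top_n_successes(ranked_flags, n=10):
    """Return the complexes with at least one near-native pose in the top n.

    ranked_flags maps a complex id to a list of booleans ordered by rank
    (best-scored pose first); True marks a near-native pose.
    """
    return {cid for cid, flags in ranked_flags.items() if any(flags[:n])}


def success_rate(successes, n_complexes):
    """Fraction of benchmark complexes that were solved."""
    return len(successes) / n_complexes


def compare_functions(solved_a, solved_b):
    """Simple set comparisons between two scoring functions (illustrative only)."""
    return {
        "either": solved_a | solved_b,   # solved by at least one function
        "both": solved_a & solved_b,     # solved by both functions
        "only_a": solved_a - solved_b,   # solved by A but missed by B
        "only_b": solved_b - solved_a,   # solved by B but missed by A
    }


# Toy rankings for three hypothetical complexes and two hypothetical functions.
func_a = {
    "complex_1": [False, True, False],
    "complex_2": [False, False, False],
    "complex_3": [True, False, False],
}
func_b = {
    "complex_1": [False, False, False],
    "complex_2": [True, False, False],
    "complex_3": [False, False, True],
}

solved_a = top_n_successes(func_a, n=2)
solved_b = top_n_successes(func_b, n=2)
comparison = compare_functions(solved_a, solved_b)
```

In this toy case the two functions solve disjoint sets of complexes, which is the kind of pairing the abstract describes as likely to work together.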

    IRaPPA: information retrieval based integration of biophysical models for protein assembly selection

    Motivation: In order to function, proteins frequently bind to one another and form 3D assemblies. Knowledge of the atomic details of these structures helps our understanding of how proteins work together and how mutations can lead to disease, and facilitates the design of drugs that prevent or mimic the interaction. Results: Atomic modeling of protein-protein interactions requires the selection of near-native structures from a set of docked poses based on their calculable properties. By considering this as an information retrieval problem, we have adapted methods developed for Internet search ranking and electoral voting into IRaPPA, a pipeline integrating biophysical properties. The approach enhances the identification of near-native structures when applied to four docking methods, resulting in a near-native appearing in the top 10 solutions for up to 50% of complexes benchmarked, and up to 70% in the top 100. Availability and Implementation: IRaPPA has been implemented in the SwarmDock server (http://bmm.crick.ac.uk/~SwarmDock/), pyDock server (http://life.bsc.es/pid/pydockrescoring/) and ZDOCK server (http://zdock.umassmed.edu/), with code available on request. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
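To make the "electoral voting" idea concrete, here is a minimal sketch of rank aggregation with a Borda count. The abstract does not say which voting scheme IRaPPA uses, so Borda is a stand-in chosen for simplicity, and the pose names and rankings are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several rankings of the same poses into one consensus ranking.

    Each ranking is a list of pose ids, best first. A pose earns
    (len(ranking) - 1 - position) points per ranking; ties in total
    points are broken alphabetically so the output is deterministic.
    """
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, pose in enumerate(ranking):
            points[pose] += n - 1 - position
    return sorted(points, key=lambda pose: (-points[pose], pose))


# Three hypothetical scoring functions each rank the same three poses.
rankings = [
    ["pose_1", "pose_2", "pose_3"],
    ["pose_2", "pose_1", "pose_3"],
    ["pose_2", "pose_3", "pose_1"],
]
consensus = borda_aggregate(rankings)
```

Here pose_2 collects the most points across the three rankings and therefore heads the consensus list.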

    Updates to the Integrated Protein–Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2

    We present an updated and integrated version of our widely used protein–protein docking and binding affinity benchmarks. The benchmarks consist of non-redundant, high-quality structures of protein–protein complexes along with the unbound structures of their components. Fifty-five new complexes were added to the docking benchmark, 35 of which have experimentally measured binding affinities. These updated docking and affinity benchmarks now contain 230 and 179 entries, respectively. In particular, the number of antibody–antigen complexes has increased significantly, by 67% and 74% in the docking and affinity benchmarks, respectively. We tested previously developed docking and affinity prediction algorithms on the new cases. Considering only the top 10 docking predictions per benchmark case, a prediction accuracy of 38% is achieved on all 55 cases and up to 50% for the 32 rigid-body cases only. Predicted affinity scores are found to correlate with experimental binding energies up to r = 0.52 overall and r = 0.72 for the rigid complexes. Peer reviewed. Postprint (author's final draft).

    Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization

    Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthews Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.
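The two headline metrics in this abstract, the Pearson correlation coefficient (PCC) for regression and the Matthews correlation coefficient (MCC) for classification, can be sketched in a few lines. This uses the standard textbook definitions and toy inputs; it is not the authors' evaluation code, and returning 0.0 for an undefined MCC is a common convention adopted here as an assumption.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation between predicted and experimental values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy values only: perfectly correlated predictions and a small confusion matrix.
pcc = pearson_cc([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
mcc = matthews_cc(tp=3, tn=3, fp=1, fn=1)
```

MCC is a natural choice when the positive class is rare, as with the stabilizing mutations mentioned above, because it accounts for all four cells of the confusion matrix.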

    Relationship between experimental ΔΔG, Δlog10(k_off), Δlog10(k_on) and change in interface hotspot energy (Int_HS_Energy) for 713 mutations in SKEMPI.

    (A) PCC between experimental ΔΔG and the respective Δlog10(k_off) and Δlog10(k_on) for single-point alanine, single-point non-alanine, multi-point and all 713 mutations. (B) PCC between Int_HS_Energy and the respective ΔΔG, Δlog10(k_off) and Δlog10(k_on) for single-point alanine, single-point non-alanine, multi-point and all 713 mutations. Experimental values for the 713 mutations used here are extracted from SKEMPI [41] and are presented in Dataset S1.

    Detection of rare complex stabilizing mutations using off-rate classification models.

    (A) Ranked list of 31 stabilizing mutations (Δlog10(k_off) < −1) in the SKEMPI off-rate dataset. The list is ranked according to the number of off-rate prediction classification models that detect the mutation in question as stabilizing. Detections per model (B) are highlighted in white, and non-detections are highlighted in black. The lower portion of (A) is dominated by single-point mutations to alanine residues, which suggests that the stabilizing effects of these mutations, as opposed to their more common neutralizing/destabilizing effects, are much harder to characterize.

    Off-rate prediction models using hotspot and molecular descriptors.

    A number of RF regression and classification models are built using different sets of hotspot and molecular descriptors. The prediction accuracy is also assessed on subsets of mutations defined as data regions. The data regions enable us to identify classes of mutations that are consistently harder to characterize, as well as data set biases and prediction patterns. (A) PCC values for off-rate model predictions with Δlog10(k_off). Models use hotspot descriptors, or a combination of hotspot and molecular descriptors. The different methods indicate the hotspot prediction method by which the hotspot descriptors were generated. (B) Data region analysis of predictions from each model. The predictions from each model are divided into the respective categories shown on the x-axis, and values in the matrix show the PCC achieved by the given model for the given data region. (C) MCC values for off-rate classifier model predictions for classification data sets CDS1 in blue and CDS2 in red. CDS1 includes neutral mutations whereas CDS2 excludes neutral mutations; hence the detection of stabilizing mutants is enhanced in the latter, though results for CDS1 are more relevant for interface design scenarios. (D–F) are similar to (A–C) except that off-rate prediction models using subsets of molecular descriptors are investigated. CP – Coarse-Grain Potentials; AP – Atomic-Based Potentials; CP-AP – All Statistical Potentials; PB – Physics Based Energy Terms. As a benchmark comparison, results for RFSpot_KFC2(Off-Rate) (best performing off-rate predictor using hotspot descriptors) and RF_Spot_KFC2(Off-Rate)+MOL (best performing off-rate predictor using hotspot and molecular descriptors) are also included in (D–F).